Active Bias: Training More Accurate Neural Networks by Emphasizing High Variance Samples

Neural Information Processing Systems

Self-paced learning and hard example mining re-weight training instances to improve learning accuracy. This paper presents two improved alternatives based on lightweight estimates of sample uncertainty in stochastic gradient descent (SGD): the variance in predicted probability of the correct class across iterations of mini-batch SGD, and the proximity of the correct class probability to the decision threshold. Extensive experimental results on six datasets show that our methods reliably improve accuracy in various network architectures, including additional gains on top of other popular training techniques, such as residual learning, momentum, ADAM, batch normalization, dropout, and distillation.


Reviews: Active Bias: Training More Accurate Neural Networks by Emphasizing High Variance Samples

Neural Information Processing Systems

The paper proposes a novel way of sampling (or weighting) data points during the training of neural networks. The idea is that one would like to sample more often those data points that could potentially be classified well but are hard to learn (in contrast to outliers or wrongly labeled ones). To find them, the authors propose two schemes (four if split into sampling and weighting). The first (SGD-*PV) weights data points according to the variance of the predictive probability of the true label plus its confidence interval, under the assumption that the prediction probability is Gaussian distributed. The second (SGD-*TC), as far as I understand, encodes whether the probability of choosing the correct label, given past prediction probabilities, is close to the decision threshold. The statistics needed (means and variances of p) can be computed on the fly during a burn-in phase of the optimizer; they can be obtained from a forward pass of the network, which is computed anyway.
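The two schemes described in the review can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the smoothing constant `eps`, and the exact weight formulas are assumptions chosen for illustration. In particular, the variance-based weight omits the confidence-interval term the review mentions, and the threshold-based weight uses `p_bar * (1 - p_bar)` as one simple score that peaks when the mean probability sits at a 0.5 threshold.

```python
import math
from statistics import mean, variance


def prediction_variance_weight(probs, eps=0.05):
    """Weight that grows with the spread of the recorded true-label probabilities.

    `probs` is the history of p(true label | x) for one sample, collected
    from earlier forward passes. `eps` (an assumed smoothing constant) keeps
    every sample at a non-zero weight.
    """
    if len(probs) < 2:
        return eps  # not enough history to estimate a variance
    return math.sqrt(variance(probs)) + eps


def threshold_closeness_weight(probs, eps=0.05):
    """Weight that is largest when the mean true-label probability is near 0.5."""
    if not probs:
        return eps
    p_bar = mean(probs)
    return p_bar * (1.0 - p_bar) + eps


def normalize(weights):
    """Rescale weights so that their average is 1 within a mini-batch."""
    total = sum(weights)
    return [w * len(weights) / total for w in weights]


# Hypothetical histories for three samples in one mini-batch.
history = {
    "easy": [0.90, 0.92, 0.91, 0.93],      # consistently correct
    "uncertain": [0.20, 0.80, 0.30, 0.70],  # prediction keeps flipping
    "outlier": [0.02, 0.01, 0.03, 0.02],    # consistently wrong
}
batch_weights = normalize(
    [prediction_variance_weight(p) for p in history.values()]
)
```

Under these assumptions the "uncertain" sample receives the largest weight, while both the easy sample and the consistently misclassified one are de-emphasized, which matches the intent of favoring hard but learnable samples over outliers. The normalized weights would multiply the per-sample losses in the mini-batch.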


Carpe Diem, Seize the Samples Uncertain "At the Moment" for Adaptive Batch Selection

arXiv.org Machine Learning

The performance of deep neural networks is significantly affected by how well mini-batches are constructed. In this paper, we propose a novel adaptive batch selection algorithm called Recency Bias that exploits uncertain samples, namely those predicted inconsistently in recent iterations. The historical label predictions of each sample are used to evaluate its predictive uncertainty within a sliding window. By taking advantage of this design, Recency Bias not only accelerates the training step but also achieves a more accurate network. We demonstrate the superiority of Recency Bias through extensive evaluation on two independent tasks. Compared with existing batch selection methods, Recency Bias reduced the test error by up to 20.5% in a fixed wall-clock training time. At the same time, it reduced the training time needed to reach the same test error by up to 59.3%.
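A sliding-window uncertainty estimate of the kind described in this abstract might be sketched as follows. The abstract does not give the formula, so the normalized entropy of the recent label predictions used here is an assumption, as are the class name `RecentPredictions`, the `floor` constant, and the proportional sampling rule; none of these should be read as the paper's actual method.

```python
import math
from collections import Counter, deque


def window_uncertainty(pred_labels, num_classes):
    """Normalized entropy (0 to 1) of the labels predicted within the window.

    0 means the sample was always given the same label; 1 means the
    predictions were spread evenly over all classes.
    """
    n = len(pred_labels)
    if n == 0 or num_classes < 2:
        return 0.0
    counts = Counter(pred_labels)
    entropy = sum(-(c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(num_classes)


class RecentPredictions:
    """Keeps only the last `window` predicted labels for each sample."""

    def __init__(self, window):
        self.window = window
        self._labels = {}

    def record(self, sample_id, predicted_label):
        # deque(maxlen=...) silently drops the oldest entry when full.
        buf = self._labels.setdefault(sample_id, deque(maxlen=self.window))
        buf.append(predicted_label)

    def uncertainty(self, sample_id, num_classes):
        return window_uncertainty(self._labels.get(sample_id, ()), num_classes)


def sampling_probabilities(uncertainties, floor=0.01):
    """Turn uncertainties into selection probabilities (assumed rule).

    `floor` keeps fully consistent samples selectable.
    """
    scores = [u + floor for u in uncertainties]
    total = sum(scores)
    return [s / total for s in scores]
```

With these pieces, a mini-batch could be drawn using `random.choices(sample_ids, weights=probs, k=batch_size)`, so that samples whose recent predictions disagree are selected more often.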